CSC 580 Default Final Project -- Highway with SB3's PPO

Course/Section: CSC 580 AI 2 Final Project

Assignment Name: Reinforcement Learning Analysis in Highway Environments

Group: 40 | Names: Om Prakash Gunja (2131025), Raju Meesala (2119844)

# ## Code to mount Google Drive (when running in Colab)
# from google.colab import drive
# drive.mount("/content/drive")  # the Google Drive root will be mapped here
# Change the working directory to your own working directory (where this notebook lives).
import os
thisdir = '/Users/omprakashgunja/Documents/Classes/Winter 2025/AI 2.0/CSC580_Winter2025/Final Project/Finals/PPO_experiments'
os.chdir(thisdir)

# Ensure the files are there (in the folder)
!pwd
/Users/omprakashgunja/Documents/Classes/Winter 2025/AI 2.0/CSC580_Winter2025/Final Project/Finals/PPO_experiments

1. Install the necessary libraries and set things up

# Install environment and agent
%pip install highway-env
# NOTE: we install the bleeding-edge version of stable-baselines3 because older
# stable releases did not support the latest gymnasium versions. If necessary,
# revert to the stable release at the next SB3 version.
%pip install git+https://github.com/DLR-RM/stable-baselines3 2>/dev/null #1>&2
%pip install stable-baselines3
Requirement already satisfied: highway-env in /Users/omprakashgunja/miniforge3/envs/ml-mps/lib/python3.10/site-packages (1.10.1)
[... dependency output trimmed ...]
Note: you may need to restart the kernel to use updated packages.
Collecting git+https://github.com/DLR-RM/stable-baselines3
  Cloning https://github.com/DLR-RM/stable-baselines3 to /private/var/folders/lh/w_mb55fn49v3tqcf3drw5wqc0000gn/T/pip-req-build-x8iraycm
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: stable-baselines3 in /Users/omprakashgunja/miniforge3/envs/ml-mps/lib/python3.10/site-packages (2.5.0)
[... dependency output trimmed ...]
Note: you may need to restart the kernel to use updated packages.

# Environment
import gymnasium as gym   # Note: gymnasium is already installed in Colab
import highway_env        # noqa: F401
from gymnasium.wrappers import RecordVideo

gym.register_envs(highway_env)  # register highway-env's environments with gymnasium

# Agent
from stable_baselines3 import DQN, PPO  # PPO is the agent used throughout; add others to experiment

# Visualization utils -- including tensorboard
%load_ext tensorboard

import sys
from tqdm.notebook import trange
%pip install tensorboardx gym pyvirtualdisplay
%pip install tensorboard
# %apt-get install -y xvfb ffmpeg
The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
Requirement already satisfied: tensorboardx in /Users/omprakashgunja/miniforge3/envs/ml-mps/lib/python3.10/site-packages (2.6.2.2)
Requirement already satisfied: gym in /Users/omprakashgunja/miniforge3/envs/ml-mps/lib/python3.10/site-packages (0.26.2)
Requirement already satisfied: pyvirtualdisplay in /Users/omprakashgunja/miniforge3/envs/ml-mps/lib/python3.10/site-packages (3.0)
[... dependency output trimmed ...]
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: tensorboard in /Users/omprakashgunja/miniforge3/envs/ml-mps/lib/python3.10/site-packages (2.18.0)
[... dependency output trimmed ...]
Note: you may need to restart the kernel to use updated packages.
def run_record(vdir, modelPath, vdofileName, env, forHowLong, freq):
    """Run a trained model in the given env and record a video of the rollouts."""
    video_dir = vdir  # e.g. './videos/highway/PPO'
    if not os.path.exists(video_dir):
        os.makedirs(video_dir)

    # Enable trajectory projection and raise the rendering frame rate
    env.unwrapped.config["show_trajectories"] = True
    env.unwrapped.config["simulation_frequency"] = freq  # higher FPS for rendering
    env.unwrapped.config["policy_frequency"] = 5
    env.unwrapped.configure(env.unwrapped.config)  # apply the updated configuration

    # Load the saved model (all models recorded in this section are PPO)
    model = PPO.load(modelPath, env=env, print_system_info=False)

    # The RecordVideo wrapper is incompatible with highway-env here, so we
    # collect rendered frames manually and build the video with moviepy.
    import moviepy as mpy
    import numpy as np

    frames = []
    for episode in range(forHowLong):  # forHowLong = number of episodes to record
        done = truncated = False
        obs, info = env.reset()
        while not (done or truncated):
            # Predict an action and step the environment
            action, _states = model.predict(obs)
            obs, reward, done, truncated, info = env.step(action)
            # Render and append the frame
            frames.append(np.array(env.render()))

    env.close()

    # Save the collected frames as a video
    clip = mpy.ImageSequenceClip(frames, fps=freq)
    videofile = video_dir + vdofileName  # vdofileName starts with '/'
    clip.write_videofile(videofile)
import glob
import io
from IPython import display as ipythondisplay
from IPython.display import HTML
import base64

def show_video(videofile):
  mp4list = glob.glob(videofile)
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    video = io.open(mp4, 'r+b').read()
    encoded = base64.b64encode(video)
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 300px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else:
    print("Could not find video")

2. Training

First, set up TensorBoard locally to visualize training.

# set the tensorboard log directory
tb_log_dir = "./tensorboard_logs"
if not os.path.exists(tb_log_dir):
    os.makedirs(tb_log_dir)

# %load_ext tensorboard
# %tensorboard --logdir ./tensorboard_logs
%reload_ext tensorboard

Create an environment and model -- and train!!!

# # create an environment
# env = gym.make("highway-fast-v0", render_mode="rgb_array")
# obs, info = env.reset()

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.evaluation import evaluate_policy

Trying out the baseline model

Hypothesis: with the baseline settings we expect the model to learn very little in this short run, since the learning rate is fairly small and the training budget (2e4 timesteps) is tiny.

import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn

# Create the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
obs, info = env.reset()

policy_kwargs = dict(
    net_arch=dict(pi=[256, 256], vf=[256, 256])  # Custom network architecture
)

# Create the PPO model with tuning parameters and custom network architecture
model = PPO(
    "MlpPolicy",
    env,                         # use the environment created above
    learning_rate=3e-4,          # Base learning rate
    n_steps=2048,                # Number of steps to run for each environment per update
    batch_size=64,               # Batch size for each update
    n_epochs=10,                 # Number of epochs when optimizing the surrogate loss
    gamma=0.99,                  # Discount factor
    gae_lambda=0.95,             # Lambda for Generalized Advantage Estimation
    clip_range=0.2,              # Clipping parameter for the surrogate loss
    ent_coef=0.0,                # Coefficient for the entropy loss term
    vf_coef=0.7,                 # Value function loss coefficient
    verbose=0,
    tensorboard_log=tb_log_dir,  # Tensorboard log directory for tuning analysis
    policy_kwargs=policy_kwargs  # Pass custom policy keyword arguments
)

# Train the model for a reduced number of timesteps for the preliminary experiment
model.learn(total_timesteps=int(2e4))
<stable_baselines3.ppo.ppo.PPO at 0x369459030>
# save the model in the specified directory
saved_model_dir = "../saved_models/PPO"
if not os.path.exists(saved_model_dir):
    os.makedirs(saved_model_dir)

# Save model with custom name
model.save(f"{saved_model_dir}/ppo_highway_base.zip")
%reload_ext tensorboard
%load_ext tensorboard
%tensorboard --logdir './tensorboard_logs'
The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
Reusing TensorBoard on port 6010 (pid 54023), started 0:02:42 ago. (Use '!kill 54023' to kill it.)
# Evaluate the agent
from stable_baselines3.common.evaluation import evaluate_policy

# NOTE: If you use wrappers with your environment that modify rewards,
#       this will be reflected here. To evaluate with original rewards,
#       wrap environment in a "Monitor" wrapper before other wrappers.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print (f'{mean_reward}, {std_reward}')
20.920220999999998, 0.9433981132056604
import gymnasium as gym
vdoFile = "/ppo_highway_base.mp4"
modelPath = '../saved_models/PPO/ppo_highway_base.zip'
vdir = './videos/highway/PPO'
# Create a new environment for testing
env = gym.make("highway-v0", render_mode="rgb_array")

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=15, freq=15)

show_video(vdir+vdoFile)
MoviePy - Building video ./videos/highway/PPO/ppo_highway_base.mp4.
MoviePy - Writing video ./videos/highway/PPO/ppo_highway_base.mp4

                                                                           
MoviePy - Done !
MoviePy - video ready ./videos/highway/PPO/ppo_highway_base.mp4

Hyperparameter Tuning #1

import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn

# Create the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
obs, info = env.reset()

policy_kwargs = dict(
    activation_fn=nn.ReLU,  # Activation function
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128])  # Custom network architecture
)

# Create the PPO model with tuning parameters and custom network architecture
model = PPO(
    "MlpPolicy",
    env,                         # use the environment created above
    learning_rate=3e-4,          # Base learning rate
    n_steps=4096,                # Number of steps to run for each environment per update
    batch_size=128,               # Batch size for each update
    n_epochs=20,                 # Number of epochs when optimizing the surrogate loss
    gamma=0.99,                  # Discount factor
    gae_lambda=0.95,             # Lambda for Generalized Advantage Estimation
    clip_range=0.2,              # Clipping parameter for the surrogate loss
    ent_coef=0.0,                # Coefficient for the entropy loss term
    vf_coef=0.7,                 # Value function loss coefficient
    verbose=0,
    tensorboard_log=tb_log_dir+'/hyper1',  # Tensorboard log directory for tuning analysis
    policy_kwargs=policy_kwargs  # Pass custom policy keyword arguments
)

# Train the model for a reduced number of timesteps for the preliminary experiment
model.learn(total_timesteps=int(2e4))
<stable_baselines3.ppo.ppo.PPO at 0x302200790>
# save the model in the specified directory
saved_model_dir = "../saved_models/PPO"
if not os.path.exists(saved_model_dir):
    os.makedirs(saved_model_dir)

# Save model with custom name
model.save(f"{saved_model_dir}/ppo_highway_hyper1.zip")

Display the logged data (from training) in tensorboard

%reload_ext tensorboard

%load_ext tensorboard
%tensorboard --logdir './tensorboard_logs'
The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
Reusing TensorBoard on port 6010 (pid 54023), started 0:07:10 ago. (Use '!kill 54023' to kill it.)
# Evaluate the agent
from stable_baselines3.common.evaluation import evaluate_policy

# NOTE: If you use wrappers with your environment that modify rewards,
#       this will be reflected here. To evaluate with original rewards,
#       wrap environment in a "Monitor" wrapper before other wrappers.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print (f'{mean_reward}, {std_reward}')
20.820220999999997, 0.8717797887081347
import gymnasium as gym
vdoFile = "/ppo_highway_hyper1.mp4"
modelPath = '../saved_models/PPO/ppo_highway_hyper1.zip'
vdir = './videos/highway/PPO'
# Create a new environment for testing
env = gym.make("highway-v0", render_mode="rgb_array")

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=15, freq=15)

show_video(vdir+vdoFile)
MoviePy - Building video ./videos/highway/PPO/ppo_highway_video.mp4.
MoviePy - Writing video ./videos/highway/PPO/ppo_highway_video.mp4

                                                                           
MoviePy - Done !
MoviePy - video ready ./videos/highway/PPO/ppo_highway_video.mp4

Post-Hyperparameter Tuning Analysis & Next Steps

After tuning the hyperparameters, we observed that while the value function improved significantly (higher explained variance and lower value loss), the overall policy performance declined. The hyperparameter-tuned model exhibited shorter episode lengths and lower rewards compared to the baseline PPO, suggesting that the agent struggled to generalize its decision-making effectively.

This was likely due to over-restrictive policy updates: with 20 optimization epochs per rollout, the 0.2 clip range is hit frequently, and the zero entropy bonus allows the policy to converge prematurely to conservative behavior.
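
To check such claims directly, the relevant scalars can be read back out of the TensorBoard event files. A minimal sketch, assuming SB3's default logger tags and a run directory such as ./tensorboard_logs/PPO_1 (adjust to your actual run folder):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

run_dir = "./tensorboard_logs/PPO_1"  # assumed run folder name; pick the run to inspect
acc = EventAccumulator(run_dir)
acc.Reload()  # parse the event files on disk

# SB3's PPO logs these scalar tags by default
for tag in ("train/explained_variance", "train/value_loss", "rollout/ep_rew_mean"):
    if tag in acc.Tags().get("scalars", []):
        events = acc.Scalars(tag)  # list of (wall_time, step, value) events
        print(f"{tag}: final value = {events[-1].value:.4f}")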

Next Hyperparameter Updates & Hypothesis

For the next hyperparameter tuning, we will make the following changes:

  1. Reduce clip_range to 0.15 → Allows more flexibility in policy updates and reduces excessive clipping.
  2. Increase ent_coef to 0.01 → Encourages exploration and prevents premature convergence.
  3. Reduce learning_rate to 2e-4 → Slows down aggressive policy updates while maintaining learning stability.

Hypothesis:

With these adjustments, we expect the agent to explore more broadly, avoid premature convergence, and update its policy more smoothly, which should show up as longer episodes and higher rewards.

Hyperparameter Tuning #2

import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn

# Create the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
obs, info = env.reset()

# Updated policy architecture and parameters
policy_kwargs = dict(
    activation_fn=nn.ReLU,  # Activation function
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128])  # Custom network architecture
)

# Create PPO model with improved hyperparameters
model = PPO(
    "MlpPolicy",
    env,                         # use the environment created above
    learning_rate=2e-4,          # Reduced learning rate for more stable updates
    n_steps=4096,                # Number of steps per update
    batch_size=128,              # Batch size per update
    n_epochs=20,                 # More training iterations per update
    gamma=0.99,                  # Discount factor
    gae_lambda=0.95,             # Lambda for Generalized Advantage Estimation
    clip_range=0.15,             # Reduced clipping range for smoother updates
    ent_coef=0.01,               # Increased entropy coefficient to encourage exploration
    vf_coef=0.7,                 # Value function coefficient
    verbose=0,
    tensorboard_log=tb_log_dir+'/hyper2',  # Tensorboard log directory
    policy_kwargs=policy_kwargs  # Custom policy settings
)

# Train the model for an extended number of timesteps
model.learn(total_timesteps=int(5e4))  # Increased timesteps for better convergence

# Save the trained model
model.save(f"{saved_model_dir}/ppo_highway_hyper2.zip")
# Evaluate the agent
from stable_baselines3.common.evaluation import evaluate_policy

# NOTE: If you use wrappers with your environment that modify rewards,
#       this will be reflected here. To evaluate with original rewards,
#       wrap environment in a "Monitor" wrapper before other wrappers.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print (f'{mean_reward}, {std_reward}')
19.3868877, 4.25166624042857
%load_ext tensorboard
%tensorboard --logdir './tensorboard_logs'
The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
Reusing TensorBoard on port 6010 (pid 54023), started 0:32:38 ago. (Use '!kill 54023' to kill it.)
import gymnasium as gym
vdoFile = "/ppo_highway_hyper2.mp4"
modelPath = '../saved_models/PPO/ppo_highway_hyper2.zip'
vdir = './videos/highway/PPO'
# Create a new environment for testing
env = gym.make("highway-v0", render_mode="rgb_array")

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)

show_video(vdir+vdoFile)
MoviePy - Building video ./videos/highway/PPO/ppo_highway_hyper2.mp4.
MoviePy - Writing video ./videos/highway/PPO/ppo_highway_hyper2.mp4

                                                                           
MoviePy - Done !
MoviePy - video ready ./videos/highway/PPO/ppo_highway_hyper2.mp4

Hyperparameter Tuning Results: Iteration "hyper2"

The updated hyperparameter tuning (“hyper2”) again showed improvements in the value function (higher explained variance and lower value loss). However, improvements in episode reward and length remain modest compared to the baseline, which suggests that better value estimates alone are not yet translating into better driving behavior.

Hyperparameter Tuning: Goals for "hyper3"

For the next tuning iteration ("hyper3"), our objective is to increase lane-switching efficiency, improve speed control, and balance risk-taking. The following changes have been implemented:

Expected Changes in "hyper3"

  1. More Decisive Lane Switching
    • Increase ent_coef to 0.02:
      Encourages bolder, more exploratory decisions.
    • Set vf_coef to 0.75:
      Enhances the value function’s estimation, helping the agent trust its lane-switching decisions.
  2. Better Speed Control
    • Reduce gamma to 0.97:
      This shortens the effective planning horizon from roughly 1/(1-0.99) = 100 steps to about 1/(1-0.97) ≈ 33 steps, shifting the focus toward immediate speed incentives and reducing over-cautious long-term planning.
  3. Encouraging Risk-Taking (While Staying Optimal)
    • The reward function is adjusted to favor higher speeds and overtaking behavior, allowing the agent to compete more effectively with other cars rather than lagging behind (a config-level sketch follows this list; the full wrapper-based version appears in the Env tuning section).
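
One lightweight way to bias highway-env's built-in reward in this direction is through the environment config. A minimal sketch, assuming the standard highway-env reward keys (the weight values here are illustrative, not the ones used below):

import gymnasium as gym
import highway_env  # noqa: F401

env = gym.make("highway-fast-v0", render_mode="rgb_array")
env.unwrapped.config.update({
    "high_speed_reward": 0.6,        # weight for driving near the top of reward_speed_range
    "reward_speed_range": [25, 35],  # shift the speeds that earn the bonus upward
    "lane_change_reward": 0.05,      # small positive weight so lane changes are not feared
    "collision_reward": -1.0,        # keep crashes strongly penalized
})
obs, info = env.reset()  # the updated config takes effect on reset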

Additional Hyperparameters

The remaining hyperparameters (learning_rate=2e-4, n_steps=4096, batch_size=128, clip_range=0.15) are kept at their "hyper2" values; see the code below.

Hypothesis

With these changes, we expect the agent to switch lanes more decisively, maintain higher speeds, and overtake slower vehicles instead of lagging behind them.

Hyperparameter Tuning #3

import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn

# Create the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
obs, info = env.reset()

policy_kwargs = dict(
    activation_fn=nn.ReLU,  # Activation function
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128])  # Custom network architecture
)

# Create PPO model with refined hyperparameters
model = PPO(
    "MlpPolicy",
    env,                         # use the environment created above
    learning_rate=2e-4,          # Stable learning rate
    n_steps=4096,                # Longer rollout for long-term decisions
    batch_size=128,              # Large batch size for gradient stability
    n_epochs=20,                 # More training epochs per update
    gamma=0.97,                  # Reduce long-term planning to favor speed & overtaking
    gae_lambda=0.95,             # GAE parameter for variance control
    clip_range=0.15,             # Slightly lower clip range for smooth policy updates
    ent_coef=0.02,               # More exploration to prevent hesitations
    vf_coef=0.75,                # Encourage better value function estimation
    verbose=0,
    tensorboard_log=tb_log_dir+'/hyper3',  # New TensorBoard log directory
    policy_kwargs=policy_kwargs  # Custom policy settings
)

# Train the model for more timesteps to allow better adaptation
model.learn(total_timesteps=int(6e4))  # Increased timesteps for stability

# Save the trained model
model.save(f"{saved_model_dir}/ppo_highway_hyper3.zip")
# Evaluate the agent
from stable_baselines3.common.evaluation import evaluate_policy

# NOTE: If you use wrappers with your environment that modify rewards,
#       this will be reflected here. To evaluate with original rewards,
#       wrap environment in a "Monitor" wrapper before other wrappers.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print (f'{mean_reward}, {std_reward}')
21.120220999999997, 0.8306623862918076
%tensorboard --logdir './tensorboard_logs'
Reusing TensorBoard on port 6010 (pid 55268), started 0:00:12 ago. (Use '!kill 55268' to kill it.)
import gymnasium as gym
vdoFile = "/ppo_highway_hyper3.mp4"
modelPath = '../saved_models/PPO/ppo_highway_hyper3.zip'
vdir = './videos/highway/PPO'
# Create a new environment for testing
env = gym.make("highway-v0", render_mode="rgb_array")

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)

show_video(vdir+vdoFile)
MoviePy - Building video ./videos/highway/PPO/ppo_highway_hyper3.mp4.
MoviePy - Writing video ./videos/highway/PPO/ppo_highway_hyper3.mp4

                                                                           
MoviePy - Done !
MoviePy - video ready ./videos/highway/PPO/ppo_highway_hyper3.mp4

PPO Hyperparameter Tuning: Hyper3 Analysis vs. Previous Models

Performance Comparison

After tuning the hyperparameters for the third iteration (Hyper3), we observed several notable changes in training dynamics and performance. Compared to the Baseline PPO (PPO_1), Hyper1, and Hyper2, Hyper3 demonstrates improved episode length and reward stability, but its qualitative driving behavior in the rendered rollouts remains suboptimal, especially in speed control, lane-switching confidence, and overtaking.

Key Observations

Next Steps: Testing in Merge Environment & Implementing a Custom Reward Wrapper

Since Hyper3 does not show clear improvements in lane-switching and overtaking, the next step is to:

  1. Test the model in the Merge Environment
    • This will assess how well the agent handles entering and exiting highway ramps.
    • If similar hesitation and speed issues persist, it confirms that the policy itself needs further tuning.
  2. Introduce a Custom Reward Wrapper to Encourage Speed & Overtaking
    • Add a speed-based reward to incentivize higher velocity.
    • Implement a proximity-based overtaking reward, so the agent learns to overtake instead of staying behind slower vehicles.
    • Introduce a penalty for unnecessary lane-switching, reducing hesitation and encouraging more decisive maneuvers.

The next steps will involve analyzing the model's behavior in the Merge Environment and designing a custom reward structure (implemented in the Env tuning section below) to improve driving performance.

import gymnasium as gym
vdoFile = "/ppo_merge_hyper3.mp4"
modelPath = '../saved_models/PPO/ppo_highway_hyper3.zip'
vdir = './videos/merge/PPO'
# Create a new environment for testing
env = gym.make("merge-v0", render_mode="rgb_array")

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
[Per-step debug output trimmed: 'crash'/'over' flags were printed at every step — almost all False, with a handful of overTrue events and a single crashTrue.]
MoviePy - Building video ./videos/merge/PPO/ppo_merge_hyper3.mp4.
MoviePy - Writing video ./videos/merge/PPO/ppo_merge_hyper3.mp4

                                                                         
MoviePy - Done !
MoviePy - video ready ./videos/merge/PPO/ppo_merge_hyper3.mp4

show_video(vdir+vdoFile)

Env tuning

import gymnasium as gym
import numpy as np
from gymnasium import RewardWrapper

class CustomHighwayReward(RewardWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.prev_lane = None  # track the ego lane to detect lane changes

    def reset(self, **kwargs):
        self.prev_lane = None  # do not carry lane history across episodes
        return self.env.reset(**kwargs)

    def reward(self, reward):
        """Modify the reward based on speed, overtaking, and lane-switching efficiency."""
        # Get the ego vehicle's state
        ego_vehicle = self.unwrapped.vehicle
        speed = ego_vehicle.speed               # current ego speed
        lane_index = ego_vehicle.lane_index[2]  # lane id of the ego vehicle

        # Reward for maintaining high speed (normalized to a max of ~30)
        speed_bonus = speed / 30

        # Encourage overtaking: 0.5 per slower vehicle currently behind the ego
        # (a per-step proxy for overtakes, not a count of discrete overtake events)
        num_overtakes = sum(
            1 for v in self.unwrapped.road.vehicles
            if v.position[0] < ego_vehicle.position[0] and v.speed < ego_vehicle.speed
        )
        overtake_bonus = num_overtakes * 0.5

        # Small penalty on every lane change to discourage jittery switching
        lane_switch_penalty = -0.2 if self.prev_lane is not None and self.prev_lane != lane_index else 0

        # Remember the current lane for the next step's comparison
        self.prev_lane = lane_index

        # Compute the final shaped reward
        return reward + speed_bonus + overtake_bonus + lane_switch_penalty

# Wrap the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
env = CustomHighwayReward(env)
obs, info = env.reset()
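
As a quick sanity check (a hypothetical probe, not part of training), we can step the wrapped environment once and confirm the shaped reward comes back:

action = env.action_space.sample()  # random action just to exercise the wrapper
obs, shaped_reward, done, truncated, info = env.step(action)
print(f"shaped reward after one random step: {shaped_reward:.3f}")
obs, info = env.reset()  # reset again before training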
# Define the policy network structure (same as before to maintain learning consistency)
policy_kwargs = dict(
    activation_fn=nn.ReLU,
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128]) 
)

# Load the pretrained model from `hyper3`; custom_objects overrides the stored
# policy_kwargs so the loaded network matches the architecture defined above
model = PPO.load(f"{saved_model_dir}/ppo_highway_hyper3.zip", env=env, custom_objects={"policy_kwargs": policy_kwargs})

# Continue training in the new environment
model.learn(
    total_timesteps=int(4e4),  # Fine-tune with 40,000 steps
    reset_num_timesteps=False  # Do NOT reset timesteps (continue from previous training)
)

# Save the fine-tuned model
model.save(f"{saved_model_dir}/ppo_merge_hyper4.zip")
# NOTE: this evaluation uses the shaped reward from CustomHighwayReward, so the
#       score is not directly comparable to the earlier (unshaped) evaluations.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f'{mean_reward}, {std_reward}')
43.6517909, 4.665951590621669

vdoFile = "/ppo_merge_hyper4.mp4"
modelPath = '../saved_models/PPO/ppo_merge_hyper4.zip'
vdir = './videos/merge/PPO'
# Create a new environment for testing
env = gym.make("merge-v0", render_mode="rgb_array")

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
[Per-step debug output trimmed: 'crash'/'over' flags were printed at every step — almost all False, with one crashTrue and a few overTrue events; the remaining output is truncated.]
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overTrue
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overTrue
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overTrue
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overTrue
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashTrue
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overTrue
MoviePy - Building video ./videos/merge/PPO/ppo_merge_hyper4.mp4.
MoviePy - Writing video ./videos/merge/PPO/ppo_merge_hyper4.mp4

                                                                         
MoviePy - Done !
MoviePy - video ready ./videos/merge/PPO/ppo_merge_hyper4.mp4

show_video(vdir+vdoFile)
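For reference, the per-step flags summarized above come from the recording loop inside run_record (defined earlier in this notebook). The snippet below is a minimal sketch of such a loop, not run_record's exact implementation; the rollout length is arbitrary and the model path mirrors the cells nearby.

# Sketch only: roll out the trained policy and print the crashed/done flags per step.
import gymnasium as gym
import highway_env  # noqa: F401  (registers merge-v0 and the other highway environments)
from stable_baselines3 import PPO

env = gym.make("merge-v0", render_mode="rgb_array")
model = PPO.load("../saved_models/PPO/ppo_merge_hyper4.zip", env=env)

obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    over = terminated or truncated
    print("crash" + str(info["crashed"]))  # True only on collision steps
    print("over" + str(over))              # True when an episode ends
    if over:
        obs, info = env.reset()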

Fine-Tuning PPO in Merge Environment: Observations & Next Steps

Observations from Merge Environment Testing

After fine-tuning the previously trained model in the merge-v0 environment, the agent's performance improved in several key areas.

Despite these improvements, further testing is needed to assess the model's performance in more complex environments.

Next Steps: Testing in Roundabout Environment

The next phase of evaluation will involve testing the fine-tuned PPO model in the roundabout-v0 environment. This zero-shot transfer test will determine whether additional tuning or reward modifications are necessary to handle roundabout navigation challenges effectively.

vdoFile = "/ppo_rounabout_hyper4.mp4"
modelPath = '../saved_models/PPO/ppo_merge_hyper4.zip'
vdir = './videos/roundabout/PPO'
# Create a fresh environment for testing and reset it before recording
env = gym.make("roundabout-v0", render_mode="rgb_array")
env.reset()

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
MoviePy - Building video ./videos/roundabout/PPO/ppo_rounabout_hyper4.mp4.
MoviePy - Writing video ./videos/roundabout/PPO/ppo_rounabout_hyper4.mp4

                                                                         
MoviePy - Done !
MoviePy - video ready ./videos/roundabout/PPO/ppo_rounabout_hyper4.mp4

show_video(vdir+vdoFile)

Roundabout Environment Testing: Observations & Next Steps

Observations from Roundabout Testing

After testing the trained PPO model in the roundabout-v0 environment, several key issues were identified. The observed behavior suggests that the agent over-prioritizes safety, leading to an overly cautious and inactive policy: it hesitates at the roundabout entrance and fails to participate effectively in traffic flow.
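One way to substantiate this observation is to measure the agent's mean speed and the fraction of near-idle steps during a rollout. The sketch below does this; the rollout length and deterministic flag are arbitrary choices, and the paths mirror the cells above.

# Sketch: quantify over-cautious behavior by logging ego speed during a rollout.
import gymnasium as gym
import highway_env  # noqa: F401
import numpy as np
from stable_baselines3 import PPO

env = gym.make("roundabout-v0", render_mode="rgb_array")
model = PPO.load('../saved_models/PPO/ppo_merge_hyper4.zip', env=env)

speeds = []
obs, info = env.reset()
for _ in range(500):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    speeds.append(env.unwrapped.vehicle.speed)  # same attribute the reward wrappers read
    if terminated or truncated:
        obs, info = env.reset()

speeds = np.array(speeds)
print(f"mean speed: {speeds.mean():.1f} m/s, idle fraction: {(speeds < 1.0).mean():.0%}")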

Planned Fixes Before Running the Next Training

To address these issues, the next iteration (hyper5) will include:

  1. Reward function modifications to encourage more confident driving:
    • Introduce a bonus for successfully entering the roundabout.
    • Reward higher speeds while maintaining safety.
    • Penalize standing still for too long to prevent hesitation.
    • Reduce harsh penalties for nearby vehicles to encourage safe merges.
  2. Hyperparameter adjustments to improve policy adaptation:
    • Increase entropy coefficient (ent_coef=0.03) to encourage more exploration.
    • Reduce gamma (gamma=0.95) to prioritize short-term gains like merging effectively.

Next Steps

The cell below implements these fixes and fine-tunes the hyper4 model for another 40,000 timesteps as hyper5 (note that the implementation uses gamma=0.8, more aggressive than the planned 0.95).

import gymnasium as gym
from gymnasium import RewardWrapper
from stable_baselines3 import PPO
import torch.nn as nn

# Custom Reward Wrapper for Roundabout Environment
class CustomRoundaboutReward(RewardWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.prev_lane = None

    def reward(self, reward):
        """Modify reward for roundabout navigation."""
        
        ego_vehicle = self.unwrapped.vehicle
        speed = ego_vehicle.speed
        lane_index = ego_vehicle.lane_index[2]

        # Encourage entering the roundabout (reward for leaving the entrance lane)
        enter_bonus = 1.0 if lane_index != 0 else 0

        # Encourage maintaining a reasonable speed
        speed_bonus = (speed / 30) * 0.5  # Reduced impact to balance safety

        # Slight penalty for standing still too long
        idle_penalty = -0.3 if speed < 1.0 else 0

        # Reduce lane-switch hesitation penalty (allow more flexibility)
        lane_switch_penalty = -0.1 if self.prev_lane is not None and self.prev_lane != lane_index else 0

        # Update lane memory
        self.prev_lane = lane_index

        # Compute final reward
        new_reward = reward + enter_bonus + speed_bonus + idle_penalty + lane_switch_penalty

        return new_reward

# Create and wrap the environment
env = gym.make("roundabout-v0", render_mode="rgb_array")
env = CustomRoundaboutReward(env)
# Load previously trained model (`hyper4`)
previous_model = PPO.load(f"{saved_model_dir}/ppo_merge_hyper4.zip", env=env)

# Match the network architecture used during previous training
policy_kwargs = dict(
    activation_fn=nn.ReLU,
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128])
)

# Reinitialize PPO with updated hyperparameters, but keep the trained policy,
# critic, and optimizer (loaded below)
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=5e-4,  # Moderate learning rate for stable fine-tuning updates
    gamma=0.8,  # Lower gamma to prioritize short-term gains such as merging
    tensorboard_log=tb_log_dir+'/PPO_roundabout_hyper5',  # New log directory
    verbose=0,
    ent_coef=0.03,  # Higher entropy coefficient for more exploration
    policy_kwargs=policy_kwargs,  # Maintain previous architecture
)

# Load policy & critic weights from previous training
# (policy.load_state_dict already covers the value network; the explicit
# value_net load below is redundant but harmless)
model.policy.load_state_dict(previous_model.policy.state_dict())
model.policy.value_net.load_state_dict(previous_model.policy.value_net.state_dict())

# Load optimizer state so fine-tuning does not restart from scratch
# (SB3 re-applies its learning-rate schedule on each update, so the new
# learning_rate above still takes effect)
model.policy.optimizer.load_state_dict(previous_model.policy.optimizer.state_dict())

# Continue training with the new hyperparameters
model.learn(
    total_timesteps=int(4e4),  # Fine-tune for another 40,000 steps
    reset_num_timesteps=False  # Continue from previous learning
)

# Save the fine-tuned model
model.save(f"{saved_model_dir}/ppo_roundabout_hyper5.zip")
%tensorboard --logdir './tensorboard_logs'
Reusing TensorBoard on port 6010 (pid 56985), started 0:00:01 ago. (Use '!kill 56985' to kill it.)
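Besides the TensorBoard curves and the rendered video below, SB3's evaluate_policy helper gives a quick quantitative check of the fine-tuned model. A minimal sketch (the episode count is an arbitrary choice):

# Sketch: quantitative evaluation of the hyper5 model beyond videos.
import gymnasium as gym
import highway_env  # noqa: F401
from stable_baselines3.common.evaluation import evaluate_policy

eval_env = gym.make("roundabout-v0", render_mode="rgb_array")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"hyper5 roundabout: {mean_reward:.2f} +/- {std_reward:.2f} over 10 episodes")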
vdoFile = "/ppo_rounabout_hyper5.mp4"
modelPath = '../saved_models/PPO/ppo_roundabout_hyper5.zip'
vdir = './videos/roundabout/PPO'
# Create a fresh environment for testing and reset it before recording
env = gym.make("roundabout-v0", render_mode="rgb_array")
env.reset()

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
MoviePy - Building video ./videos/roundabout/PPO/ppo_rounabout_hyper5.mp4.
MoviePy - Writing video ./videos/roundabout/PPO/ppo_rounabout_hyper5.mp4

                                                                         
MoviePy - Done !
MoviePy - video ready ./videos/roundabout/PPO/ppo_rounabout_hyper5.mp4

show_video(vdir+vdoFile)

Final Hyperparameter Tuning: Hyper6

Key Improvements in Hyper6

To fully eliminate hesitation and improve merging confidence, the following changes were made:

Enhancements in the Reward Function

  • A larger bonus (+1.5) for entering the roundabout, with a small penalty (-0.2) for lingering in the entrance lane.
  • An entry timer that adds a stronger idle penalty (-0.5) when the agent hesitates at the entrance for more than 30 steps.
  • The speed bonus and the reduced lane-switch penalty are retained from hyper5.

Hyperparameter Adjustments

  • learning_rate=6e-4 for faster adaptation.
  • gamma=0.98 to plan further ahead.
  • ent_coef=0.015 to reduce hesitation and commit to decisions faster.
  • clip_range=0.1 for more stable updates.

Next Steps

This is the final fine-tuning step before testing real-world-like performance.
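To make the new shaping concrete before the code, a small worked example; the constants match the wrapper defined in the next cell, and the two scenarios are hypothetical.

# Worked example of the hyper6 shaping terms (pure arithmetic, no environment needed).
# Hesitating at the entrance (lane_index == 0, speed ~ 0, stuck > 30 steps):
stalled = -0.2 + (0 / 30) * 0.5 + (-0.5)  # entry penalty + no speed bonus + idle penalty
# Cruising inside the roundabout at 15 m/s:
cruising = 1.5 + (15 / 30) * 0.5          # entry bonus + speed bonus
print(stalled, cruising)                  # -0.7 and +1.75 added to the base reward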

import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn
from gymnasium import RewardWrapper

# Custom Reward Wrapper for Roundabout Environment
class CustomRoundaboutReward(RewardWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.prev_lane = None
        self.entry_timer = 0  # Track how long the agent hesitates at entry

    def reward(self, reward):
        """Modify reward for roundabout navigation."""

        ego_vehicle = self.unwrapped.vehicle
        speed = ego_vehicle.speed
        lane_index = ego_vehicle.lane_index[2]

        # Encourage entering the roundabout; penalize lingering in the entrance lane
        enter_bonus = 1.5 if lane_index != 0 else -0.2

        # Encourage maintaining a reasonable speed
        speed_bonus = (speed / 30) * 0.5

        # Stronger penalty for idling too long before entering
        if lane_index == 0:  # Still at entry?
            self.entry_timer += 1
            idle_penalty = -0.5 if self.entry_timer > 30 else 0  # Hesitation penalty
        else:
            self.entry_timer = 0  # Reset timer once the agent enters
            idle_penalty = 0

        # Reduced lane-switch hesitation penalty
        lane_switch_penalty = -0.1 if self.prev_lane is not None and self.prev_lane != lane_index else 0
        self.prev_lane = lane_index  # Update lane memory

        # Compute the final shaped reward
        new_reward = reward + enter_bonus + speed_bonus + idle_penalty + lane_switch_penalty
        return new_reward

# Create and wrap the environment
env = gym.make("roundabout-v0", render_mode="rgb_array")
env = CustomRoundaboutReward(env)

# Load previously trained model (`hyper5`)
previous_model = PPO.load(f"{saved_model_dir}/ppo_roundabout_hyper5.zip", env=env)

# Maintain architecture but update hyperparameters
policy_kwargs = dict(
    activation_fn=nn.ReLU,
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128])
)

# Final hyper6 model adjustments
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=6e-4,  # Faster adaptation
    gamma=0.98,  # Plan further ahead
    ent_coef=0.015,  # Reduce hesitation; commit to decisions faster
    clip_range=0.1,  # More stable updates
    tensorboard_log=tb_log_dir+'/PPO_roundabout_hyper6',
    verbose=0,
    policy_kwargs=policy_kwargs,
)

# Load previous policy & value-function weights and optimizer state
# (policy.load_state_dict already covers value_net; the explicit load is harmless)
model.policy.load_state_dict(previous_model.policy.state_dict())
model.policy.value_net.load_state_dict(previous_model.policy.value_net.state_dict())
model.policy.optimizer.load_state_dict(previous_model.policy.optimizer.state_dict())

# Continue training with the updated parameters
model.learn(
    total_timesteps=int(5e4),  # Fine-tune for another 50,000 steps
    reset_num_timesteps=False
)

# Save the fine-tuned model
model.save(f"{saved_model_dir}/ppo_roundabout_hyper6.zip")
%tensorboard --logdir './tensorboard_logs'
Reusing TensorBoard on port 6010 (pid 56985), started 0:37:52 ago. (Use '!kill 56985' to kill it.)
vdoFile = "/ppo_rounabout_hyper6.mp4"
modelPath = '../saved_models/PPO/ppo_roundabout_hyper6.zip'
vdir = './videos/roundabout/PPO'
# Create a fresh environment for testing and reset it before recording
env = gym.make("roundabout-v0", render_mode="rgb_array")
env.reset()

run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
MoviePy - Building video ./videos/roundabout/PPO/ppo_rounabout_hyper6.mp4.
MoviePy - Writing video ./videos/roundabout/PPO/ppo_rounabout_hyper6.mp4

                                                                         
MoviePy - Done !
MoviePy - video ready ./videos/roundabout/PPO/ppo_rounabout_hyper6.mp4